UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition
Advances in image restoration and enhancement techniques have led to
discussion about how such algorithms can be applied as a pre-processing step to
improve automatic visual recognition. In principle, techniques like deblurring
and super-resolution should yield improvements by de-emphasizing noise and
increasing signal in an input image. But the historically divergent goals of
the computational photography and visual recognition communities have created a
significant need for more work in this direction. To facilitate new research,
we introduce a new benchmark dataset called UG^2, which contains three
difficult real-world scenarios: uncontrolled videos taken by UAVs and manned
gliders, as well as controlled videos taken on the ground. Over 160,000
annotated frames for hundreds of ImageNet classes are available, which are used
for baseline experiments that assess the impact of known and unknown image
artifacts and other conditions on common deep learning-based object
classification approaches. Further, current image restoration and enhancement
techniques are evaluated by determining whether or not they improve baseline
classification performance. Results show that there is plenty of room for
algorithmic innovation, making this dataset a useful tool going forward.
Comment: Supplemental material: https://goo.gl/vVM1xe, Dataset:
https://goo.gl/AjA6En, CVPR 2018 Prize Challenge: ug2challenge.or
Meet-in-the-middle: Multi-scale upsampling and matching for cross-resolution face recognition
In this paper, we aim to address the large domain gap between high-resolution
face images, e.g., from professional portrait photography, and low-quality
surveillance images, e.g., from security cameras. Establishing an identity
match between disparate sources like this is a classical surveillance face
identification scenario, which continues to be a challenging problem for modern
face recognition techniques. To that end, we propose a method that combines
face super-resolution, resolution matching, and multi-scale template
accumulation to reliably recognize faces from long-range surveillance footage,
including from low quality sources. The proposed approach does not require
training or fine-tuning on the target dataset of real surveillance images.
Extensive experiments show that our proposed method is able to outperform even
existing methods fine-tuned to the SCFace dataset.
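The abstract above names multi-scale template accumulation but does not say how templates are combined. The following is a minimal sketch of one plausible reading, assuming that a template is the average of L2-normalised embeddings computed at several image scales and that matching uses cosine similarity. The names `accumulate_template`, `toy_rescale`, and `toy_embed` are hypothetical; the two toy functions merely stand in for a face super-resolution model and a pretrained face recognition network, and this is not the authors' implementation.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length (epsilon guards against division by zero)."""
    return v / (np.linalg.norm(v) + 1e-12)

def accumulate_template(image, scales, rescale, embed):
    """Average the L2-normalised embeddings of one image rendered at several scales."""
    feats = [l2_normalize(embed(rescale(image, s))) for s in scales]
    return l2_normalize(np.mean(feats, axis=0))

def cosine_similarity(a, b):
    return float(np.dot(l2_normalize(a), l2_normalize(b)))

# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_rescale(image, size):
    """Nearest-neighbour resize of a square image to size x size."""
    idx = np.arange(size) * image.shape[0] // size
    return image[np.ix_(idx, idx)]

def toy_embed(image):
    """Mean-pool a square image to a 4x4 grid, flatten, and centre it."""
    n = image.shape[0] // 4
    pooled = image[:4 * n, :4 * n].reshape(4, n, 4, n).mean(axis=(1, 3))
    return pooled.ravel() - pooled.mean()

rng = np.random.default_rng(0)
gallery = rng.random((64, 64))        # stand-in for a high-resolution image
probe = toy_rescale(gallery, 16)      # stand-in for a low-resolution view of it
scales = (16, 32, 64)                 # both images are brought to the same scales

t_gallery = accumulate_template(gallery, scales, toy_rescale, toy_embed)
t_probe = accumulate_template(probe, scales, toy_rescale, toy_embed)
score = cosine_similarity(t_gallery, t_probe)
```

Rendering the gallery and the probe at the same set of scales means the two templates are compared at matching resolutions, which is the intuition behind resolution matching in this sketch.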
Recovery of superquadric parameters from range images using deep learning
With the recent advancements in deep neural computation, we devise a method to recover superquadric parameters from range images using a convolutional neural network. By training our simple, fully convolutional architecture on synthetic images, each containing a single superquadric, we achieve encouraging results. In a fixed-rotation scenario, the model could already be used in practice, but the prediction of arbitrary rotational parameters still needs to be improved in the future.
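The abstract does not spell out the superquadric parameterisation. As a hedged illustration, the sketch below assumes the conventional one: three size parameters (`a1`, `a2`, `a3`) and two shape exponents (`e1`, `e2`), evaluated with the standard inside-outside function in the superquadric's own coordinate frame. The function name is hypothetical, and the snippet shows only what the recovered parameters describe, not the network or training procedure of the paper.

```python
import numpy as np

def superquadric_inside_outside(points, a1, a2, a3, e1, e2):
    """Standard superquadric inside-outside function F.

    F < 1: point inside; F == 1: on the surface; F > 1: outside.
    Points are expressed in the superquadric's canonical frame
    (no rotation or translation is applied here).
    """
    # Absolute values keep the fractional powers real for negative coordinates.
    p = np.abs(np.asarray(points, dtype=float))
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    xy = (x / a1) ** (2.0 / e2) + (y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + (z / a3) ** (2.0 / e1)

# With e1 = e2 = 1 the superquadric reduces to an ellipsoid with semi-axes 1, 2, 3.
pts = np.array([[0.0, 0.0, 0.0],   # centre
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
F = superquadric_inside_outside(pts, 1.0, 2.0, 3.0, 1.0, 1.0)
# F is approximately [0, 1, 4]
```

Handling an arbitrary pose would additionally require transforming the points by a rotation and translation before evaluation, which corresponds to the rotational parameters the abstract identifies as the remaining difficulty.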
Segmentation and Recovery of Superquadric Models using Convolutional Neural Networks
In this paper we address the problem of representing 3D visual data with
parameterized volumetric shape primitives. Specifically, we present a
(two-stage) approach built around convolutional neural networks (CNNs) capable
of segmenting complex depth scenes into the simpler geometric structures that
can be represented with superquadric models. In the first stage, our approach
uses a Mask RCNN model to identify superquadric-like structures in depth scenes
and then fits superquadric models to the segmented structures using a specially
designed CNN regressor. Using our approach we are able to describe complex
structures with a small number of interpretable parameters. We evaluated the
proposed approach on synthetic as well as real-world depth data and show that
our solution not only achieves competitive performance in comparison to
the state of the art, but is also able to decompose scenes into a number of
superquadric models at a fraction of the time required by competing approaches.
We make all data and models used in the paper available from
https://lmi.fe.uni-lj.si/en/research/resources/sq-seg.
Comment: 8 pages, in Computer Vision Winter Workshop, 202
EFaR 2023: Efficient Face Recognition Competition
This paper presents the summary of the Efficient Face Recognition Competition
(EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB
2023). The competition received 17 submissions from 6 different teams. To drive
further development of efficient face recognition models, the submitted
solutions are ranked based on a weighted score of the achieved verification
accuracies on a diverse set of benchmarks, as well as the deployability given
by the number of floating-point operations and model size. The evaluation of
submissions is extended to bias, cross-quality, and large-scale recognition
benchmarks. Overall, the paper gives an overview of the achieved performance
values of the submitted solutions as well as a diverse set of baselines. The
submitted solutions use small, efficient network architectures to reduce the
computational cost; some solutions apply model quantization. An outlook on
possible techniques that are underrepresented in current solutions is given as
well.
Comment: Accepted at IJCB 202
AUTOMATED FACE RECOGNITION FROM LOW-RESOLUTION IMAGERY
In this doctoral dissertation, we address the problem of automated face
recognition from low-resolution images using deep learning methods. Deep
learning methods have recently achieved a major breakthrough in the
performance of face recognition procedures. Deep neural models are trained for
face recognition on datasets of several million images and, owing to the
diversity of the images in the training datasets alone, are able to operate
under conditions such as changes in illumination, facial pose, and expression,
in contrast to classical approaches to face recognition, where the effects of
such factors are modeled explicitly.
Despite the breakthrough brought by deep learning, automated face recognition
systems still fall short of human capabilities in some circumstances. One such
circumstance is low resolution of face images, which can result either from
image capture with low-quality cameras or from the distance of the face from
the camera. In the dissertation, we first carry out a systematic study of the
effects of image quality factors on the capability of automated face
recognition systems, in which we find that image resolution has a strong
effect on recognition performance. We then develop a method for image quality
enhancement based on a new convolutional neural network architecture for
super-resolution and a new loss function for face super-resolution that takes
into account both reconstruction quality and the identity information content.
In experiments comparing against competing models for face image enhancement,
we find that the developed model is better able to reconstruct high-resolution
details and is more useful for higher-level computer vision tasks such as face
recognition and facial landmark localization.
Based on the developed model, we carry out a study of the bias of
super-resolution models and find that all tested models exhibit pronounced
bias in favor of the image degradation model used to generate the training
dataset for super-resolution. Because of this bias, none of the tested face
image enhancement models is able to systematically improve images in terms of
their usefulness for face recognition when the images are real low-resolution
images rather than artificially downsampled ones.
Based on this finding, we develop a new method for face recognition from
low-resolution images that builds on the results of the previously developed
model for enhancing low-resolution images. The method is based on combining
super-resolution results at multiple scales and extracting features with
pretrained face recognition models. With experiments on the SCFace dataset, we
show that the developed method successfully exploits the information added by
the image enhancement models and improves face recognition results.
Recently, significant advances in the field of automated face recognition have
been achieved using computer vision, machine learning, and deep learning
methodologies. However, despite claims of super-human performance of face
recognition algorithms on select key benchmark tasks, there remain several
open problems that preclude the general replacement of human face recognition
work with automated systems.
State-of-the-art automated face recognition systems based on deep learning
methods are able to achieve high accuracy when the face images from which
they must recognize subjects are of sufficiently high quality.
However, low image resolution remains one of the principal obstacles to face
recognition systems, and their performance in the low-resolution regime is decidedly
below human capabilities. In this PhD thesis, we present a systematic
study of modern automated face recognition systems in the presence of image
degradation in various forms. Based on our findings, we then propose a novel
technique for improving the quality of low-resolution face images. Specifically, we present a novel deep learning model architecture for image super-resolution,
and a novel training procedure for face hallucination that trains
the model to super-resolve face images in a manner that preserves the information
about the subject identity present in the low-resolution image. We
validate the model by comparing its image reconstruction capability against
several state-of-the-art models, as well as its performance on downstream
semantic tasks including face recognition and face landmark localization.
Next, we study the generalization capabilities of super-resolution-based
face hallucination models, and find most of the models studied to be heavily
biased towards the artificial image degradation process used to generate their
training datasets. We notice that due to this bias, none of the face hallucination
models considered are able to outperform an interpolation baseline
on face recognition benchmarks with real-life low-resolution images.
To overcome this problem, we then develop a novel method for face recognition
from low-resolution images that uses the results of multi-scale face
hallucination models developed earlier. The proposed method is able to
benefit from the high-resolution information added by the face hallucination
models without suffering from the training set bias they exhibit, and systematically
outperform the interpolation baseline and other state-of-the-art
low-resolution face recognition models on the SCFace benchmark.
Our proposed methods are trained on large face image datasets in a manner
typical for deep learning models. However, the resulting trained models
are useful for face recognition applications in an open-set regime, and do not
need to be re-trained for novel subjects.